
All Questions

1 vote
1 answer
158 views

Why are these two implementations of the $\epsilon$-greedy policy different?

According to the book Reinforcement Learning: An Introduction, the epsilon-greedy policy can generally be implemented as: $$ \pi(a|s) = \begin{cases} \frac{\epsilon}{|A|} + 1 - \epsilon & \text{if } ...
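
The excerpt is cut off, so what follows is only a minimal sketch of the usual textbook form of this policy, not necessarily either implementation the question compares. It assumes the greedy action gets probability $\frac{\epsilon}{|A|} + 1 - \epsilon$ and every other action gets $\frac{\epsilon}{|A|}$, with ties broken by taking the first maximal action. The names `epsilon_greedy_probs` and `sample_action` are hypothetical.

```python
import random


def epsilon_greedy_probs(q_values, epsilon):
    """Return the action probabilities pi(a|s) for a single state.

    q_values: list of action-value estimates Q(s, a), one per action.
    epsilon:  exploration rate in [0, 1].
    """
    n_actions = len(q_values)
    # Greedy action; max() keeps the first index on ties (an assumption).
    greedy = max(range(n_actions), key=lambda a: q_values[a])
    # Every action gets the uniform exploration share epsilon / |A| ...
    probs = [epsilon / n_actions] * n_actions
    # ... and the greedy action also gets the remaining 1 - epsilon.
    probs[greedy] += 1.0 - epsilon
    return probs


def sample_action(q_values, epsilon, rng=random):
    """Draw one action index according to the epsilon-greedy probabilities."""
    probs = epsilon_greedy_probs(q_values, epsilon)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]
```

For example, `epsilon_greedy_probs([1.0, 3.0, 2.0], 0.3)` puts roughly 0.8 on the second action and 0.1 on each of the others, so the probabilities sum to 1.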
kklaw
